Frequency-domain Linear Prediction for Temporal Features
Abstract
Current speech recognition systems uniformly employ short-time spectral analysis, usually over windows of 10-30 ms, as the basis for their acoustic representations. Any detail below this timescale is lost, and even temporal structure above this level is usually only weakly represented, for example in the form of delta features. We address this limitation by proposing a novel representation of the temporal envelope in different frequency bands, obtained by exploring the dual of conventional linear prediction (LPC) when applied in the transform domain. With this technique of frequency-domain linear prediction (FDLP), the ‘poles’ of the model describe temporal, rather than spectral, peaks. By using analysis windows on the order of hundreds of milliseconds, the procedure automatically decides how to distribute poles to best model the temporal structure within the window. While this approach offers many possibilities for novel speech features, we experiment with one particular form, an index describing the ‘sharpness’ of individual poles within a window, and show a large word error rate improvement, from 4.97% to 3.81% (about 23% relative), in a recognizer trained on general conversational telephone speech and tested on a small-vocabulary spontaneous numbers task. We analyze this improvement in terms of the confusion matrices and suggest how the newly modeled fine temporal structure may be helping.
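The duality described in the abstract can be illustrated with a short sketch. It is not the authors' implementation: it assumes a full-band signal (no sub-band split), a DCT-II as the transform, and the autocorrelation method for the linear predictor. The function names (`dct2`, `lpc`, `fdlp_envelope`), the model order, and the test signal are all hypothetical choices made for illustration. Fitting an all-pole model to the DCT coefficients and evaluating its "spectrum" along the time axis gives a smooth estimate of the squared temporal envelope, so a pole close to the unit circle corresponds to a sharp peak in time.

```python
import numpy as np

def dct2(x):
    """Unnormalized DCT-II computed from an explicit cosine matrix."""
    N = len(x)
    n = np.arange(N)
    k = n[:, None]
    return np.cos(np.pi * (n + 0.5) * k / N) @ x

def lpc(seq, order):
    """Autocorrelation-method linear prediction (assumed variant).

    Returns the coefficients [1, a_1, ..., a_p] and the prediction-error gain.
    """
    r = np.array([seq[: len(seq) - m] @ seq[m:] for m in range(order + 1)])
    idx = np.abs(np.subtract.outer(np.arange(order), np.arange(order)))
    R = r[idx] + 1e-9 * r[0] * np.eye(order)  # Toeplitz matrix, tiny diagonal loading
    a = np.linalg.solve(R, -r[1:])
    gain = r[0] + a @ r[1:]
    return np.concatenate(([1.0], a)), gain

def fdlp_envelope(x, order=8):
    """All-pole estimate of the squared temporal envelope of x."""
    N = len(x)
    a, gain = lpc(dct2(x), order)
    # In the dual domain the "frequency" axis of the model maps to time.
    theta = np.pi * (np.arange(N) + 0.5) / N
    A = np.exp(-1j * np.outer(theta, np.arange(order + 1))) @ a
    return gain / np.abs(A) ** 2

# Hypothetical test signal: a single click at sample 300 in weak noise.
rng = np.random.default_rng(0)
x = 0.001 * rng.standard_normal(512)
x[300] += 1.0
env = fdlp_envelope(x)
peak = int(np.argmax(env))
print("envelope peak at sample", peak)
```

In this sketch the click becomes a sinusoid across the DCT coefficients, so the predictor places a pole pair at the corresponding angle and the envelope peaks near the click position. A sub-band version would apply the same fit to windowed segments of the DCT sequence, one per frequency band.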
Related resources
Temporal resolution analysis in frequency domain linear prediction.
Frequency domain linear prediction (FDLP) is a technique for auto-regressive modeling of Hilbert envelopes. In this letter, the resolution properties of the FDLP model are investigated using synthetic signals with impulses immersed in noise. The effects of various factors that influence the temporal resolution are studied, and this analysis suggests ways to improve the resolution of the FDLP envelo...
Phoneme Classification Using Temporal Tracking of Speech Clusters in Spectro-temporal Domain
This article presents a new feature extraction technique based on the temporal tracking of clusters in the spectro-temporal feature space. In the proposed method, auditory cortical outputs were clustered, and the attributes of the speech clusters were extracted as secondary features. However, the shape and position of speech clusters change over time. The clusters are temporally tracked and temporal tra...
A Novel Temporal-Frequency Domain Error Concealment Method for Motion Jpeg
Motion-JPEG is a common video format for high-quality compression of motion images, using the JPEG standard for each frame of the video. During transmission through a noisy channel, some blocks of data are lost or corrupted, and the quality of the decompressed frames decreases. In this paper, for reconstruction of these blocks, several temporal-domain, spatial-domain, and frequency-domain error conceal...
Time-Varying Autoregressions for Speaker Verification in Reverberant Conditions
In poor room acoustic conditions, speech signals received by a microphone may be corrupted by delayed versions of the signal that are reflected from the room surfaces (e.g. walls, floor). This phenomenon, reverberation, reduces the accuracy of automatic speaker verification systems by causing a mismatch between training and testing. Since reverberation causes temporal smearing to the signa...
PLP²: Autoregressive modeling of auditory-like 2-D spectro-temporal patterns
The temporal trajectories of the spectral energy in auditory critical bands over 250 ms segments are approximated by an all-pole model, the time-domain dual of conventional linear prediction. This quarter-second auditory spectro-temporal pattern is further smoothed by iterative alternation of spectral and temporal all-pole modeling. Just as Perceptual Linear Prediction (PLP) uses an autoregress...
Publication date: 2003